A Japanese National Project on Spontaneous Speech Corpus and Processing Technology

نویسندگان

  • Sadaoki Furui
  • Kikuo Maekawa
  • Hitoshi Isahara
چکیده

A new national project for raising the technological level of speech recognition and understanding has recently commenced in Japan. This project aims at a) building a large-scale spontaneous speech corpus consisting of roughly 7M words and 800 hours of speech, b) acoustic and linguistic modeling for spontaneous speech understanding and summarization using linguistic as well as para-linguistic information in speech, and c) building a prototype of a spontaneous speech summarization system. The corpus under compilation will contain spontaneously uttered Common Japanese speech and the morphologically annotated transcriptions. Also, segmental and intonation labeling will be provided for a subset of the corpus. The primary application domain of the corpus is speech recognition of spontaneous speech, but it is also planned to become a useful research corpus both for natural language processing and phonetic/linguistic studies.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Steps toward Flexible Speech Recognition – Recent Progress at Tokyo Institute of Technology –

This paper describes recent progress at Tokyo Institute of Technology and the author’s perspectives for making speech recognition systems more flexible at both the acoustic and linguistic processing levels. Specifically, it describes a broadcast news transcription system, a multimodal dialogue system for information retrieval, neural-network-based HMM adaptation for noisy speech, online increme...

متن کامل

Recent Progress in Corpus-Based Spontaneous Speech Recognition

This paper overviews recent progress in the development of corpus-based spontaneous speech recognition technology. Although speech is in almost any situation spontaneous, recognition of spontaneous speech is an area which has only recently emerged in the field of automatic speech recognition. Broadening the application of speech recognition depends crucially on raising recognition performance f...

متن کامل

Spontaneous Speech Recognition and Summarization

This paper overviews recent progress in the development of corpus-based spontaneous speech recognition technology focusing on various achievements of a Japanese 5-year national project “Spontaneous Speech: Corpus and Processing Technology”. Although speech is in almost any situation spontaneous, recognition of spontaneous speech is an area which has only recently emerged in the field of automat...

متن کامل

Corpus and Text Analysis of Spontaneous Japanese

There are three major parts of the “Spontaneous Speech: Corpus and Processing Technology” project; (1) compilation of large spontaneous speech corpus, (2) establishment of spoken language engineering based on the corpus, and (3) developing a prototype of a spoken language summarization system. This paper describes how we help to develop this large corpus, i.e., (1), using technology developed a...

متن کامل

Recent Advances in Spontaneous Speech Recognition and Understanding

How to recognize and understand spontaneous speech is one of the most important issues in state-of-the-art speech recognition technology. In this context, a five-year large-scale national project entitled “Spontaneous Speech: Corpus and Processing Technology” started in Japan in 1999. This paper gives an overview of the project and reports on the major results of experiments that have been cond...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003